On the Projective Geometry of Kalman Filter

Francesca Paola Carli and Rodolphe Sepulchre 


Abstract — Convergence of the Kalman filter is best analyzed 
by studying the contraction of the Riccati map in the space of 
positive definite (covariance) matrices. In this paper, we explore 
how this contraction property relates to a more fundamental 
non-expansiveness property of filtering maps in the space of 
probability distributions endowed with the Hilbert metric. This 
is viewed as a preliminary step towards improving the convergence
analysis of filtering algorithms over general graphical
models. 


I. Introduction 

This paper is about the asymptotic behavior of the Kalman 
filter [11]. The Kalman-Bucy filter merges predictions from 
a trusted model of the dynamics of the system with incoming 
measurements in order to get an accurate, real-time estimate 
of the unknown internal state of the system. The estimation 
relies on the computation of a positive semidefinite matrix 
P, the covariance of the estimation error. The difference 
equation verified by P is a discrete-time algebraic Riccati 
equation. Kalman showed that, for a linear time-invariant 
system, under detectability conditions, the Riccati equation 
converges to a fixed point, which is unique under certain 
stabilizability conditions ([10], see also [9]). The classical 
convergence analysis requires several steps, showing that the 
error covariance is upper bounded, that, with zero initial 
value, it is monotone increasing, so that it admits a limit, 
and then proving that the corresponding filter is stable and 
that the limit is the same for all initial covariances. 

In [4] Bougerol proposed a more geometric convergence 
analysis by showing that the discrete-time Riccati iteration 
is a contraction for the Riemannian metric associated to the 
cone of positive definite matrices. Other authors elaborated 
along these lines (see e.g. [16], [19], [13], [7]), showing 
that the Riccati operator is a contraction with respect to 
other metrics (e.g. Thompson’s metric) and providing explicit 
formulas for the contraction coefficients. 

In this paper, we seek to relate the convergence of the 
Kalman iteration, and, in particular, of the Riccati flow, to 
the contraction of the (projective) Hilbert metric under the 
action of a nonlinear map on the space of positive measurable 
functions (as opposed to the action of the nonlinear Riccati 
operator on the space of positive definite matrices). The 
choice of Hilbert metric seems to be particularly sensible 
in this context since, thanks to its invariance under
scaling, it allows one to study the convergence
of a nonlinear iteration via the analysis of a linear one. To
this end, the Kalman iteration is seen as a specialization 
for Gaussian distributions of filtering algorithms for general 
hidden Markov models (HMMs) and the observation is 
made that the underlying iteration of these general filtering 
algorithms never expands the Hilbert metric. This approach 
is more general than the analysis of the Riccati iteration but 
at the price of a weaker result, since only non-expansiveness
of the Hilbert metric can be shown. The gap between
non-expansiveness and contraction is certainly a nontrivial one
in the infinite dimensional space of probability distributions. 
Using the Hilbert metric, convergence results have been 
proved in [1], [15] (see also [14] for some results concerning 
HMMs with finite state space) where problems arising from 
non-compact state spaces or heavy tailed distributions have 
been considered. We envision that this approach can open 
the way to a geometric analysis of filtering algorithms on 
general graphical models, e.g., of arbitrary topology. 

The paper is organized as follows. Sections II and III
establish common notation by introducing the Hilbert metric
and the Kalman filter iteration. In Section IV we show
that the nonlinear iteration underlying filtering algorithms
for general HMMs does not expand the Hilbert metric on
the space of positive measurable functions. In Section V
we show that the Kalman iteration can indeed be seen
as a specialization to Gaussian distributions of forward
filtering algorithms for general HMMs and as such does not
expand the Hilbert metric on the space of positive measurable
functions. Section VI discusses convergence. Section VII ends
the paper.

Notation. Throughout the paper, if $\mathcal{K}$ is a cone, we denote
by $\mathcal{K}^+$ the interior of $\mathcal{K}$. In particular, we will denote by
$\mathcal{P}$ ($\mathcal{P}^+$) the cone of positive semidefinite (definite) matrices,
while $\mathcal{F}$ ($\mathcal{F}^+$) will be used to denote the cone of nonnegative
(positive) measurable functions with respect to a suitable
$\sigma$-algebra.


II. Hilbert metric

The Hilbert metric was introduced in [8]. Birkhoff [3] 
(see also [5]) showed that strict positivity of a mapping 
implies contraction in the Hilbert metric, paving the way to 
many contraction-based results in the literature of positive 
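operators. The Hilbert metric is defined as follows. Let $\mathcal{B}$
be a real Banach space and let $\mathcal{K}$ be a closed solid cone in
$\mathcal{B}$, that is, a closed subset $\mathcal{K}$ with the properties that
(i) $\mathcal{K}^+$ is non-empty; (ii) $\mathcal{K} + \mathcal{K} \subseteq \mathcal{K}$;
(iii) $\mathcal{K} \cap -\mathcal{K} = \{0\}$; (iv) $\lambda \mathcal{K} \subseteq \mathcal{K}$
for all $\lambda \geq 0$. Define the partial order

$$ x \preceq y \iff y - x \in \mathcal{K}, $$

and for $x, y \in \mathcal{K} \setminus \{0\}$, let

$$ M(x, y) := \inf \{\lambda \mid x - \lambda y \preceq 0\}, $$
$$ m(x, y) := \sup \{\lambda \mid x - \lambda y \succeq 0\}. $$

The Hilbert metric $d_H(\cdot, \cdot)$ induced by $\mathcal{K}$ is defined by

$$ d_H(x, y) := \log \frac{M(x, y)}{m(x, y)}, \qquad x, y \in \mathcal{K} \setminus \{0\}. \tag{1} $$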

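For example, if $\mathcal{B} = \mathbb{R}^n$ and the cone $\mathcal{K}$ is the positive
orthant, $\mathcal{K} = \mathcal{O} := \{(x_1, \ldots, x_n) : x_i \geq 0,\ 1 \leq i \leq n\}$,
then $M(x, y) = \max_i (x_i / y_i)$ and $m(x, y) = \min_i (x_i / y_i)$,
and the Hilbert metric can be expressed as

$$ d_H(x, y) = \log \frac{\max_i (x_i / y_i)}{\min_i (x_i / y_i)}. $$

On the other hand, if $\mathcal{B} = \mathcal{S} := \{X \in \mathbb{R}^{n \times n} : X = X^\top\}$ is the
set of symmetric matrices and $\mathcal{K} = \mathcal{P} := \{X \succeq 0 \mid X \in \mathcal{S}\}$
is the cone of positive semidefinite matrices, then for
$X, Y \succ 0$, $M(X, Y) = \lambda_{\max}(X Y^{-1})$ and $m(X, Y) =
\lambda_{\min}(X Y^{-1})$. Hence the Hilbert metric is

$$ d_H(X, Y) = \log \frac{\lambda_{\max}(X Y^{-1})}{\lambda_{\min}(X Y^{-1})}. $$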
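
The two finite-dimensional examples above can be evaluated numerically. The following is a minimal sketch, assuming NumPy is available; the function names `hilbert_orthant` and `hilbert_psd` are illustrative and do not appear in the paper, and the inputs are assumed to lie in the interior of the respective cone.

```python
import numpy as np

def hilbert_orthant(x, y):
    """Hilbert metric between two vectors with strictly positive entries."""
    r = np.asarray(x, dtype=float) / np.asarray(y, dtype=float)
    return float(np.log(r.max() / r.min()))

def hilbert_psd(X, Y):
    """Hilbert metric between two symmetric positive definite matrices."""
    # Y^{-1} X is similar to X Y^{-1}, so both share the same real positive spectrum.
    lam = np.linalg.eigvals(np.linalg.solve(Y, X)).real
    return float(np.log(lam.max() / lam.min()))

x = np.array([1.0, 2.0, 4.0])
y = np.array([2.0, 2.0, 2.0])
print(hilbert_orthant(x, y))            # the ratios x_i / y_i are 0.5, 1, 2
print(hilbert_orthant(3 * x, 5 * y))    # same value: positive scaling is ignored

X = np.array([[2.0, 0.5], [0.5, 1.0]])
Y = np.eye(2)
print(hilbert_psd(X, Y), hilbert_psd(X, 10 * X))   # second value vanishes up to rounding
```

The distance only depends on the rays through the two arguments, which is why rescaling either argument leaves the value unchanged.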


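In the following, we will be interested in positive operators
on finite measures. In this context, the Hilbert metric is
defined as follows. Let $X$ be a complete separable metric
space and let $\mathcal{X}$ be the $\sigma$-algebra of Borel subsets of $X$.
Moreover, let $\mathcal{B} = \mathcal{V}$ be the vector space of finite signed
measures on $(X, \mathcal{X})$ and $\mathcal{K} = \mathcal{C}(X)$ be the set of finite
nonnegative measures on $X$. We recall that two elements
$\lambda, \mu \in \mathcal{C}(X)$ are called comparable if $\alpha \lambda \leq \mu \leq \beta \lambda$
for suitable positive scalars $\alpha, \beta$. The Hilbert metric on
$\mathcal{C}(X) \setminus \{0\}$ is defined as

$$ d_H(\mu, \mu') = \begin{cases} \log \dfrac{M(\mu, \mu')}{m(\mu, \mu')} & \text{if } \mu, \mu' \text{ comparable}, \\ \infty & \text{otherwise}, \end{cases} $$

with $M$ and $m$ as in (1) for the cone $\mathcal{C}(X)$.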


An important property of the Hilbert metric is the following.
The Hilbert metric is a projective metric on $\mathcal{K}$, i.e. it
is nonnegative, symmetric, it satisfies the triangle inequality
and is such that, for every $x, y \in \mathcal{K}$, $d_H(x, y) = 0$ if and only
if $x = \lambda y$ for some $\lambda > 0$. It follows easily that $d_H(x, y)$ is
constant on rays, that is

$$ d_H(\lambda x, \mu y) = d_H(x, y) \quad \text{for } \lambda, \mu > 0. \tag{2} $$


Hilbert metric and positive mappings 

In this section, we review contraction properties of positive 
operators with respect to the Hilbert metric. We recall that a
map $A : \mathcal{K} \to \mathcal{K}$ is said to be positive; a map $A : \mathcal{K}^+ \to \mathcal{K}^+$
is said to be strictly positive. If $A$ is a strictly positive linear
map, we denote by

$$ k(A) := \inf \{\lambda : d_H(Ax, Ay) \leq \lambda\, d_H(x, y) \ \ \forall x, y \in \mathcal{K}^+\} \tag{3} $$

the contraction ratio of $A$ and by

$$ \Delta(A) := \sup \{d_H(Ax, Ay) : x, y \in \mathcal{K}^+\} \tag{4} $$


its projective diameter. Contraction properties of positive 
operators with respect to the Hilbert metric are established 
in the following theorem [3], [5], [12]. 

Theorem 2.1: If $x, y \in \mathcal{K}$, then the following holds:

(i) If $A$ is a positive linear map on $\mathcal{K}$, then $d_H(Ax, Ay) \leq
d_H(x, y)$, i.e. the Hilbert metric contracts weakly under
the action of a positive linear transformation.

(ii) [Birkhoff, 1957] If $A$ is a strictly positive linear map in
$\mathcal{B}$, then

$$ k(A) = \tanh \tfrac{1}{4} \Delta(A). \tag{5} $$

Let $U$ denote the unit sphere in $\mathcal{B}$ and let $E$ be the metric
space $E := \{\mathcal{K}^+ \cap U, d_H\}$. Then, by combining Theorem
2.1 (ii) with the Banach contraction mapping theorem, the
following generalization of the Perron-Frobenius theorem
holds: if $\Delta(A) < \infty$ and if the metric space $E$ is complete,
then there exists a unique positive eigenvector of $A$ in $E$.
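
For an entrywise positive matrix acting on the positive orthant, the quantities in Theorem 2.1 can be computed explicitly. The sketch below assumes NumPy and relies on the classical cross-ratio expression $\Delta(A) = \max_{i,j,k,l} \log \frac{a_{ik} a_{jl}}{a_{jk} a_{il}}$ for the projective diameter of a positive matrix, which is a standard fact and not stated in this paper. The helper names are hypothetical, and iterates are normalized to unit sum rather than unit norm, which does not affect projective distances.

```python
import numpy as np

def hilbert(x, y):
    """Hilbert metric on the interior of the positive orthant."""
    r = x / y
    return float(np.log(r.max() / r.min()))

def projective_diameter(A):
    """Delta(A) of an entrywise positive matrix via the cross-ratio formula."""
    L = np.log(A)
    # D[i, j, k, l] = log(a_ik * a_jl / (a_jk * a_il))
    D = (L[:, None, :, None] + L[None, :, None, :]
         - L[None, :, :, None] - L[:, None, None, :])
    return float(D.max())

rng = np.random.default_rng(0)
A = rng.uniform(0.5, 2.0, size=(4, 4))          # entrywise positive matrix
kappa = np.tanh(projective_diameter(A) / 4.0)   # contraction ratio from (5)

x, y = rng.uniform(0.1, 1.0, 4), rng.uniform(0.1, 1.0, 4)
ratio = hilbert(A @ x, A @ y) / hilbert(x, y)   # bounded by kappa

for _ in range(60):                              # normalized power iteration
    x, y = A @ x, A @ y
    x, y = x / x.sum(), y / y.sum()
gap = hilbert(x, y)                              # both iterates reach the same ray
print(kappa, ratio, gap)
```

Since the projective diameter is finite here, both sequences converge to the ray of the Perron eigenvector, in line with the generalization of the Perron-Frobenius theorem recalled above.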

III. Kalman filter and the Riccati operator

In this section, we briefly introduce the Kalman filter
iteration, which is analyzed later on in Section V, where an
alternative derivation is also provided.

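Let us consider a linear dynamical system

$$ X_{k+1} = A X_k + W_k, \qquad k \geq 0, \tag{6a} $$
$$ Y_k = C X_k + V_k, \tag{6b} $$

where $\{W_k\}$ and $\{V_k\}$ are mutually uncorrelated white
Gaussian noise processes with covariances $\Gamma$ and $\Sigma$,
respectively, i.e.

$$ W_k \sim \mathcal{N}(0, \Gamma), \qquad V_k \sim \mathcal{N}(0, \Sigma), \tag{7} $$

and with initial condition

$$ X_0 \sim \mathcal{N}(\mu_0, P_0) \tag{8} $$

such that

$$ E[W_k X_0^\top] = 0, \qquad E[V_k X_0^\top] = 0. \tag{9} $$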

The Kalman filter recursion consists of the following steps:

Time update ("Predict") step:

$$ \hat{X}_{k|k-1} = A \hat{X}_{k-1|k-1} \tag{10} $$
$$ P_{k|k-1} = A P_{k-1|k-1} A^\top + \Gamma \tag{11} $$

Measurement update ("Correct") step:

$$ \hat{X}_{k|k} = \hat{X}_{k|k-1} + K_k (Y_k - C \hat{X}_{k|k-1}) \tag{12} $$
$$ P_{k|k} = (I - K_k C) P_{k|k-1} \tag{13} $$
$$ K_k = P_{k|k-1} C^\top (C P_{k|k-1} C^\top + \Sigma)^{-1} \tag{14} $$

and is initialized at $\hat{X}_{0|-1} = \mu_0$, $P_{0|-1} = P_0$. Equivalently,
the following one-step expression for the a posteriori state
estimate and covariance holds

$$ P_{k|k} = \Phi(P_{k-1|k-1}), \tag{15} $$
$$ \hat{X}_{k|k} = (A - P_{k|k} C^\top \Sigma^{-1} C A)\, \hat{X}_{k-1|k-1} + P_{k|k} C^\top \Sigma^{-1} Y_k, \tag{16} $$









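where $\Phi$ is the nonlinear map

$$ \Phi(P) = (A P A^\top + \Gamma) \left[ I + C^\top \Sigma^{-1} C \Gamma + C^\top \Sigma^{-1} C A P A^\top \right]^{-1}. \tag{17} $$

The map $\Phi$ in (17) can be written as

$$ \Phi(P) = \left( (A P A^\top + \Gamma)^{-1} + C^\top \Sigma^{-1} C \right)^{-1}. \tag{18} $$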

This equation is called the discrete Riccati equation. In 
the literature, convergence of the Kalman iteration has been 
studied by proving that the discrete Riccati operator contracts
suitable metrics (e.g. the Riemannian metric [4], or
Thompson's part metric [16]) on the set of positive definite
matrices. In the following, we propose to study convergence 
of the Kalman iteration by directly analyzing an equivalent 
iteration on the space of positive measurable functions. This 
equivalent iteration will be introduced and discussed in the 
following section. 
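
The equivalence between the two-step covariance update (11), (13), (14) and the one-step forms (17) and (18) can be checked numerically. The following is a minimal sketch assuming NumPy; the system matrices are arbitrary illustrative values (not taken from the paper), chosen so that the pair $(A, C)$ is observable and $\Gamma$ is positive definite.

```python
import numpy as np

def riccati_two_step(P, A, C, Gam, Sig):
    """Covariance update via predict (11), gain (14) and correct (13)."""
    Pp = A @ P @ A.T + Gam
    K = Pp @ C.T @ np.linalg.inv(C @ Pp @ C.T + Sig)
    return (np.eye(P.shape[0]) - K @ C) @ Pp

def riccati_17(P, A, C, Gam, Sig):
    """One-step map written as in (17)."""
    M = A @ P @ A.T + Gam
    S = C.T @ np.linalg.inv(Sig) @ C
    return M @ np.linalg.inv(np.eye(P.shape[0]) + S @ M)

def riccati_18(P, A, C, Gam, Sig):
    """One-step map written as in (18)."""
    M = A @ P @ A.T + Gam
    S = C.T @ np.linalg.inv(Sig) @ C
    return np.linalg.inv(np.linalg.inv(M) + S)

# Illustrative system (assumed values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Gam = 0.01 * np.eye(2)
Sig = np.array([[0.1]])

# Iterate the Riccati map from two very different initial covariances.
P1, P2 = 10.0 * np.eye(2), 0.01 * np.eye(2)
for _ in range(500):
    P1 = riccati_18(P1, A, C, Gam, Sig)
    P2 = riccati_18(P2, A, C, Gam, Sig)
print(np.abs(P1 - P2).max())   # both sequences approach the same fixed point
```

For this example the two sequences agree to numerical precision after a few hundred steps, which illustrates the convergence to a unique fixed point discussed in the Introduction.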

IV. Non-expansiveness of the Filtering
Recursion in Projective Spaces

In this section, we introduce the filtering algorithm for 
general hidden Markov models and we show that the map 
underlying the main iteration does not expand the Hilbert 
metric on the cone of positive measurable functions. Note 
that some authors use the term hidden Markov model exclusively
for the case where $X_k$ takes values in a finite state
space. In this paper, following e.g. [6], when referring to a 
hidden Markov model we also intend to include models with 
continuous state space; such models are also referred to as 
state-space models in the literature. 

Problem statement 

In the broadest sense of the word, a hidden Markov model 
is a Markov process that is split into two components: 
an observable component and an unobservable or “hidden” 
component. That is, a hidden Markov model is a Markov
process $\{X_k, Y_k\}_{k \geq 0}$ on the state space $X \times Y$, where we
presume that we have a way of observing $Y_k$, but not $X_k$.

In simple cases such as discrete-time, countable state 
space models, it is common to define hidden Markov models 
by using the concept of conditional independence. It turns 
out that conditional independence is mathematically more 
difficult to define in general settings (in particular, when the 
state space X of the Markov process is not countable - the 
case we are interested in), so a different route is adopted 
(see [6] for details). To this aim, we define the transition 
kernel (the parallel of the transition matrix for countable state 
spaces). 

Definition 4.1: (Transition kernel) A kernel from a measurable
space $(X, \mathcal{X})$ to a measurable space $(Y, \mathcal{Y})$ is a map
$Q : X \times \mathcal{Y} \to [0, \infty]$ such that

(i) for all $x \in X$, $A \mapsto Q(x, A)$ is a measure on $\mathcal{Y}$;

(ii) for all $A \in \mathcal{Y}$, the map $x \mapsto Q(x, A)$ is measurable.

If $Q(x, Y) = 1$ for every $x \in X$, then $Q$ is called a transition
kernel.


We next consider an $X$-valued stochastic process $\{X_k\}_{k \geq 0}$,
i.e., a collection of $X$-valued random variables on a common
underlying probability space $(\Omega, \mathcal{G}, \mathbb{P})$, where $X$ is some
measurable space. The process is Markov if, for every
time $k \geq 0$, there exists a transition kernel $Q_k : X \times \mathcal{X} \to
[0, 1]$ such that

$$ \mathbb{P}(X_{k+1} \in A \mid X_0, \ldots, X_k) = Q_k(X_k, A), $$

for every $A \in \mathcal{X}$, $k \geq 0$. If $Q_k = Q$ for every $k$, then
the Markov process is called homogeneous. For simplicity
of exposition, from now on we will consider homogeneous
Markov processes, though the theory we are about to develop
does not rely on this assumption. A hidden Markov model
$\{X_k, Y_k\}_{k \geq 0}$ is a (only partially observed) Markov process,
whose transition kernel has a special structure, namely it
is such that both the joint process $\{X_k, Y_k\}_{k \geq 0}$ and the
marginal unobservable process $\{X_k\}_{k \geq 0}$ are Markov.
Formally:

Definition 4.2: (Hidden Markov Model) Let $(X, \mathcal{X})$ and
$(Y, \mathcal{Y})$ be two measurable spaces and let $Q$ and $G$ denote
a transition kernel on $(X, \mathcal{X})$ and a transition kernel from
$(X, \mathcal{X})$ to $(Y, \mathcal{Y})$. Consider the transition kernel on the
product space $(X \times Y, \mathcal{X} \otimes \mathcal{Y})$ defined by

$$ T[(x, y), C] = \iint_C Q(x, dx')\, G(x', dy'), $$

for $(x, y) \in X \times Y$, $C \in \mathcal{X} \otimes \mathcal{Y}$. The Markov process
$\{X_k, Y_k\}_{k \geq 0}$ with transition kernel $T$ and initial probability
measure $\mu$ on $(X, \mathcal{X})$ is called a hidden Markov model.

A hidden Markov model is completely determined by the
initial measure $\mu$ and its transition kernel $T$ (equivalently by
$Q$ and $G$). Formally:

Proposition 4.1: Let $\{X_k, Y_k\}_{k \geq 0}$ be a hidden Markov
model on $(X \times Y, \mathcal{X} \otimes \mathcal{Y})$ with transition kernel $Q$, observation
kernel $G$, and initial measure $\mu$. Then for every bounded
measurable function $f : (X \times Y)^{k+1} \to \mathbb{R}$,

$$ E[f(X_0, Y_0, \ldots, X_k, Y_k)] = \int f(x_0, y_0, \ldots, x_k, y_k)\, G(x_k, dy_k)\, Q(x_{k-1}, dx_k) \cdots G(x_1, dy_1)\, Q(x_0, dx_1)\, G(x_0, dy_0)\, \mu(dx_0). \tag{19} $$

In the following, we are interested in the filtering problem
for HMMs, namely the problem of computing the sequence
of conditional distributions of $X_k$ given $Y_{0:k}$. The filtering
problem, as well as the related smoothing and prediction
problems, has its origin in the work of Wiener, who was
interested in stationary processes. In the more general setting
of hidden Markov models, early contributions are the works
of Stratonovich, Shiryaev, Baum, Petrie and coworkers [18],
[17], [2]; see also [6] for a recent monograph.

Filtering algorithm 

Assume that both G and Q are absolutely continuous with 
respect to the Lebesgue measure (in the next section we 
will particularize to the case of Gaussian distributions) with 
transition density functions $g$ and $q$ respectively. In terms of
transition densities, the filtering problem can be solved as
follows.

Theorem 4.1 (Forward filtering recursion): We denote
by $\alpha_{k|s}(x_k)$ the probability density function

$$ \alpha_{k|s}(x_k) := p(x_k \mid y_{0:s}), $$

write $\alpha_k := \alpha_{k|k}$, and let

$$ g(x_k, y_k) = g_k(x_k). $$

Then $\alpha_k(x_k) = p(x_k \mid y_{0:k})$ can be recursively expressed in
terms of $\alpha_{k-1}(x_{k-1}) = p(x_{k-1} \mid y_{0:k-1})$ as follows

$$ \alpha_k(x_k) = \frac{g_k(x_k) \int q(x_{k-1}, x_k)\, \alpha_{k-1}(x_{k-1})\, dx_{k-1}}{\iint g_k(x_k)\, q(x_{k-1}, x_k)\, \alpha_{k-1}(x_{k-1})\, dx_k\, dx_{k-1}} \tag{20} $$

with iteration initialized at

$$ \alpha_0(x_0) = \frac{g_0(x_0)\, \mu(x_0)}{\int g_0(x_0)\, \mu(x_0)\, dx_0}. \tag{21} $$

The iteration (20) defines a time-varying dynamical system
over the cone $\mathcal{F}$ of nonnegative measurable functions with
respect to the product $\sigma$-algebra $\mathcal{X} \otimes \mathcal{Y}$. The following
equivalent two-step formulation holds.

Remark 4.1: [Two-step formulation of the filtering
recursion] The filtering recursion (20) is often split into two
steps.

1) prediction step: in which the one-step-ahead predictive
density is computed

$$ \alpha_{k|k-1}(x_k) = \int q(x_{k-1}, x_k)\, \alpha_{k-1}(x_{k-1})\, dx_{k-1}; \tag{22} $$

2) update step: in which the observed data from time $k$
is absorbed, yielding the filtering density

$$ \alpha_k(x_k) = \frac{g_k(x_k)\, \alpha_{k|k-1}(x_k)}{\iint g_k(x_k)\, q(x_{k-1}, x_k)\, \alpha_{k-1}(x_{k-1})\, dx_k\, dx_{k-1}}. \tag{23} $$
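
The two-step recursion of Remark 4.1 is easiest to see on a finite state space, where densities become probability vectors and integrals become sums. The sketch below assumes NumPy; the names `predict`, `update` and `forward_filter`, as well as the numerical values, are illustrative assumptions. Here `Q[i, j]` plays the role of $q(x_{k-1}, x_k)$ and `G[i, y]` the role of $g(x_k, y_k)$.

```python
import numpy as np

def predict(alpha_prev, Q):
    """Prediction step (22): alpha_{k|k-1}(j) = sum_i Q[i, j] * alpha_{k-1}(i)."""
    return Q.T @ alpha_prev

def update(alpha_pred, g_k):
    """Update step (23): multiply by the likelihood g_k and renormalize."""
    unnorm = g_k * alpha_pred
    return unnorm / unnorm.sum()

def forward_filter(mu, Q, G, ys):
    """Filtering distributions alpha_0, ..., alpha_k for the observed symbols ys."""
    alpha = update(mu, G[:, ys[0]])              # initialization (21)
    out = [alpha]
    for y in ys[1:]:
        alpha = update(predict(alpha, Q), G[:, y])
        out.append(alpha)
    return np.array(out)

mu = np.array([0.6, 0.4])                        # initial distribution
Q = np.array([[0.7, 0.3], [0.2, 0.8]])           # row-stochastic transition matrix
G = np.array([[0.9, 0.1], [0.3, 0.7]])           # G[i, y] = P(Y = y | X = i)
alphas = forward_filter(mu, Q, G, ys=[0, 1])
print(alphas)
```

Each row of the output is a probability vector; the normalization in `update` is the positive scaling that the Hilbert metric ignores.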


Non-expansiveness in projective space

First of all, notice that the nonlinear map in (20), say $\Psi_k$,
is the composition of a linear one (at the numerator) and a
positive scaling, i.e. we can write

$$ \Psi_k(f) = \frac{\bar{\Psi}_k f}{\int (\bar{\Psi}_k f)(x)\, dx} $$

where

$$ (\bar{\Psi}_k f)(x) = g_k(x) \int q(x', x)\, f(x')\, dx' \tag{24} $$

with $q$ and $g$ the transition densities associated to the transition
and observation kernels $Q$ and $G$, respectively. The next
theorem draws the consequences of the fact that the map
$\bar{\Psi}_k$ takes nonnegative measurable functions into nonnegative
measurable functions.

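Theorem 4.2: The map $\bar{\Psi}_k$ in (24) does not expand the
Hilbert metric, i.e. for all $f, h \in \mathcal{F}^+$,

$$ d_H\big((\bar{\Psi}_k f)(x), (\bar{\Psi}_k h)(x)\big) \leq d_H\big(f(x), h(x)\big). $$

Proof: The map $\bar{\Psi}_k$ is the composition of (i)
$(\bar{\Psi}^{(1)} f)(x) = \int q(x', x)\, f(x')\, dx'$ and (ii)
$(\bar{\Psi}^{(2)}_k f)(x) = g_k(x) f(x)$. The maps $\bar{\Psi}^{(1)}$ and
$\bar{\Psi}^{(2)}_k$ are positive linear
and as such they do not expand the Hilbert metric (see
Theorem 2.1 (i)). The claim follows since the composition
of nonexpansive operators is nonexpansive. By the scaling
invariance (2), the same bound holds for the normalized map
$\Psi_k$. ■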
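
The statement of Theorem 4.2 can be probed numerically on a finite state space. The following sketch assumes NumPy; the helper names and the matrices are hypothetical. It samples random pairs of positive vectors and records the largest observed ratio between the distance after and before one filtering step.

```python
import numpy as np

def hilbert(f, h):
    """Hilbert metric between two strictly positive vectors."""
    r = f / h
    return float(np.log(r.max() / r.min()))

def psi_bar(f, Q, g_k):
    """Finite-state analogue of the linear map (24): g_k(x) * sum_x' q(x', x) f(x')."""
    return g_k * (Q.T @ f)

def psi(f, Q, g_k):
    """Normalized filtering map: psi_bar followed by a positive scaling."""
    v = psi_bar(f, Q, g_k)
    return v / v.sum()

def max_expansion(Q, g_k, trials=1000, seed=0):
    """Largest observed ratio d_H(psi f, psi h) / d_H(f, h) over random positive pairs."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        f = rng.uniform(0.01, 1.0, len(g_k))
        h = rng.uniform(0.01, 1.0, len(g_k))
        worst = max(worst, hilbert(psi(f, Q, g_k), psi(h, Q, g_k)) / hilbert(f, h))
    return worst

Q_pos = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]])
g_k = np.array([0.9, 0.5, 0.1])
print(max_expansion(Q_pos, g_k))      # below 1: an entrywise positive kernel contracts
print(max_expansion(np.eye(3), g_k))  # equal to 1 up to rounding: no strict contraction
```

The second case, with the identity as transition matrix, shows that non-expansiveness cannot be improved to strict contraction without further assumptions on the kernel.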

V. Kalman filtering as Forward Filtering 
Recursion 

The classical derivation of the Kalman filter relies on an argument
based on projections onto spaces spanned by random
variables. As an alternative, the Kalman iteration can be seen
as a specialization of the filtering algorithm in Theorem 4.1
for Gaussian distributions. This fact by itself is known in
the literature (see e.g. [6]). In this section, first we briefly
review this alternative derivation of Kalman filtering. This,
combined with the (weak) contraction result of Theorem 4.2,
lets us conclude that the Kalman iteration does not expand
the Hilbert metric. Convergence of the Kalman iteration is
discussed in Section VI.

Before getting started, we observe that the linear dynamical
system (6)-(9) is indeed equivalent to a hidden
Markov model as specified by (19) with initial, transition
and emission probability densities, for $k \geq 0$, given by

$$ p(x_0) = \mathcal{N}(\mu_0, P_0), \tag{25} $$
$$ p(x_{k+1} \mid x_k) = \mathcal{N}(A x_k, \Gamma), \tag{26} $$
$$ p(y_k \mid x_k) = \mathcal{N}(C x_k, \Sigma). \tag{27} $$

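Also we recall that given the prior and likelihood

$$ p(x) = \mathcal{N}(\mu_x, \Sigma_x), \tag{28} $$
$$ p(y \mid x) = \mathcal{N}(A x + b, \Sigma_{y|x}), \tag{29} $$

the posterior $p(x \mid y)$ and normalization constant $p(y)$ are
given by

$$ p(y) = \mathcal{N}(A \mu_x + b,\ \Sigma_{y|x} + A \Sigma_x A^\top), \tag{30} $$
$$ p(x \mid y) = \mathcal{N}(\mu_{x|y}, \Sigma_{x|y}), \tag{31} $$

with

$$ \Sigma_{x|y}^{-1} = \Sigma_x^{-1} + A^\top \Sigma_{y|x}^{-1} A, \tag{32} $$
$$ \mu_{x|y} = \Sigma_{x|y} \left[ A^\top \Sigma_{y|x}^{-1} (y - b) + \Sigma_x^{-1} \mu_x \right]. \tag{33} $$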
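
The posterior formulas (32)-(33) are written in information form, whereas the Kalman update is usually written with a gain matrix. The sketch below, assuming NumPy and using arbitrary illustrative numbers, checks that the two forms agree; the gain form is the standard consequence of the matrix inversion lemma and the function names are hypothetical.

```python
import numpy as np

def posterior_information_form(mu_x, Sx, A, b, Syx, y):
    """Posterior mean and covariance computed from (32)-(33)."""
    Sxy = np.linalg.inv(np.linalg.inv(Sx) + A.T @ np.linalg.inv(Syx) @ A)
    mu = Sxy @ (A.T @ np.linalg.inv(Syx) @ (y - b) + np.linalg.inv(Sx) @ mu_x)
    return mu, Sxy

def posterior_gain_form(mu_x, Sx, A, b, Syx, y):
    """Same posterior in gain form, obtained via the matrix inversion lemma."""
    K = Sx @ A.T @ np.linalg.inv(A @ Sx @ A.T + Syx)
    mu = mu_x + K @ (y - A @ mu_x - b)
    return mu, (np.eye(len(mu_x)) - K @ A) @ Sx

# Illustrative prior, likelihood and observation (assumed values).
mu_x = np.array([1.0, -1.0])
Sx = np.array([[2.0, 0.3], [0.3, 1.0]])
A = np.array([[1.0, 0.5]])
b = np.array([0.2])
Syx = np.array([[0.4]])
y = np.array([0.7])

m1, S1 = posterior_information_form(mu_x, Sx, A, b, Syx, y)
m2, S2 = posterior_gain_form(mu_x, Sx, A, b, Syx, y)
print(m1, m2)
print(S1, S2)
```

With $A = C$, $b = 0$ and $\Sigma_{y|x} = \Sigma$, the gain form is exactly the measurement update (12)-(14).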


The next proposition connects the Kalman filter algorithm
to the filtering recursion described in Section IV.

Proposition 5.1: The Kalman filter recursion (10)-(14)
is a specialization of the forward filtering recursion of
Theorem 4.1 for an HMM with Gaussian initial, transition
and emission probabilities as in (25)-(27).
Proof: Let

$$ \mu_{k|s} := E[X_k \mid Y_{0:s}], \qquad P_{k|s} := E[(X_k - \mu_{k|s})(X_k - \mu_{k|s})^\top \mid Y_{0:s}]. $$

1) prediction step: By (22), $p(x_k \mid y_{0:k-1})$ is given by

$$ p(x_k \mid y_{0:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{0:k-1})\, dx_{k-1}. $$

Now, $p(x_k \mid x_{k-1})$ is Gaussian with mean $A x_{k-1}$ and
covariance $\Gamma$. $p(x_{k-1} \mid y_{0:k-1})$ is also Gaussian. We
denote by $\mu_{k-1|k-1}$ and $P_{k-1|k-1}$ its mean and covariance,
respectively. By virtue of (30) we get

$$ p(x_k \mid y_{0:k-1}) = \mathcal{N}(A \mu_{k-1|k-1},\ A P_{k-1|k-1} A^\top + \Gamma), $$

i.e.

$$ \mu_{k|k-1} = A \mu_{k-1|k-1}, $$
$$ P_{k|k-1} = A P_{k-1|k-1} A^\top + \Gamma, $$

which are the a priori state estimate and covariance in
(10)-(11).

2) update step: By (23), $p(x_k \mid y_{0:k})$ is given by

$$ p(x_k \mid y_{0:k}) = \frac{p(y_k \mid x_k)\, p(x_k \mid y_{0:k-1})}{p(y_k \mid y_{0:k-1})}. $$

Now $p(y_k \mid x_k)$ is Gaussian with mean $C x_k$ and covariance
$\Sigma$. $p(x_k \mid y_{0:k-1})$ is also Gaussian. We denote
by $\mu_{k|k-1}$ and $P_{k|k-1}$ its mean and covariance. By
virtue of (31) we get

$$ p(x_k \mid y_{0:k}) = \mathcal{N}(\mu_{k|k}, P_{k|k}) $$

with

$$ P_{k|k} = \left( P_{k|k-1}^{-1} + C^\top \Sigma^{-1} C \right)^{-1}, \tag{34} $$
$$ \mu_{k|k} = P_{k|k} \left[ C^\top \Sigma^{-1} y_k + P_{k|k-1}^{-1} \mu_{k|k-1} \right], \tag{35} $$

from which the expressions (12)-(13) for the a posteriori
state estimate and covariance can be recovered
via the matrix inversion lemma.


By the results in Theorem 4.2 and Proposition 5.1 we have
that the map underlying the Kalman filtering algorithm does
not expand the Hilbert metric on the space of positive measurable
functions.
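
Proposition 5.1 can be illustrated in the scalar case by running the forward recursion (22)-(23) on a discretized state space and comparing its mean and variance with those of the Kalman filter (10)-(14). This is a minimal sketch assuming NumPy; the model parameters, the observations and the grid are arbitrary illustrative choices, and integrals are approximated by Riemann sums on a uniform grid.

```python
import numpy as np

def gauss(x, mean, var):
    """Gaussian density N(x; mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def kalman_scalar(ys, a, c, gam, sig, mu0, p0):
    """Scalar Kalman filter (10)-(14); returns posterior (mean, variance) pairs."""
    m_pred, p_pred, out = mu0, p0, []
    for y in ys:
        k = p_pred * c / (c * p_pred * c + sig)        # gain (14)
        m = m_pred + k * (y - c * m_pred)              # state update (12)
        p = (1.0 - k * c) * p_pred                     # covariance update (13)
        out.append((m, p))
        m_pred, p_pred = a * m, a * p * a + gam        # prediction (10)-(11)
    return out

def grid_filter(ys, a, c, gam, sig, mu0, p0, grid):
    """Forward recursion (22)-(23) with densities sampled on a uniform grid."""
    dx = grid[1] - grid[0]
    q = gauss(grid[None, :], a * grid[:, None], gam)   # q[i, j] = q(x_i, x_j)
    pred, out = gauss(grid, mu0, p0), []
    for y in ys:
        post = gauss(y, c * grid, sig) * pred          # g_k(x) * alpha_{k|k-1}(x)
        post /= post.sum() * dx                        # positive scaling
        m = (grid * post).sum() * dx
        out.append((m, ((grid - m) ** 2 * post).sum() * dx))
        pred = (q.T @ post) * dx                       # prediction step (22)
    return out

ys = [0.3, -0.2, 0.5, 0.1]
grid = np.linspace(-8.0, 8.0, 801)
kf = kalman_scalar(ys, 0.9, 1.0, 0.5, 0.4, 0.0, 1.0)
gf = grid_filter(ys, 0.9, 1.0, 0.5, 0.4, 0.0, 1.0, grid)
print(np.abs(np.array(kf) - np.array(gf)).max())       # small discretization error
```

The grid filter never uses the Gaussian structure; it only applies the linear map (24) followed by a normalization, yet it reproduces the Kalman means and variances up to discretization error.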


VI. On strict contractiveness of the Kalman iteration

So far, we have shown that the time-varying nonlinear
operator that underlies the Kalman iteration does not expand
the Hilbert metric. Proving convergence of the Kalman
iteration indeed amounts to proving that such an iteration strictly
contracts the Hilbert metric. As observed in Section IV,
the map (20) is the composition of a linear positive map
and a positive scaling. By the scaling invariance property of
the Hilbert metric, it follows that the convergence analysis can
concentrate only on the linear numerator of (20). By Theorem
2.1 (ii), a sufficient condition for a strictly positive linear
operator to be a contraction is to have a finite projective
diameter. At this point, one may observe that even the Hilbert
distance between two Gaussians with the same variance and
different means may tend to infinity (a general discussion
that takes into account problems arising from the use of the
Hilbert metric with non-compact state spaces and heavy-tailed
distributions is contained in [1]). Proving strict contraction
usually requires exploiting the fact that the map is time-varying
and showing that the map contracts over a uniform time
horizon, as opposed to at each time instant. For iterations on
the finite dimensional space of covariance matrices, this is the
place where the observability and controllability conditions
enter the analysis. Our hope is that similar conditions
apply to more general situations than the one covered by the
Kalman filter and that this general approach will find novel
applications in the analysis of filtering algorithms on general
graphical models.
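
The remark on Gaussians with equal variance can be made concrete with a short computation. The following worked equation is a sketch for the scalar case, under the assumption that the Hilbert metric between two positive densities is evaluated as the oscillation of their log-ratio over the domain; the symbols $m_1$, $m_2$, $\sigma^2$ and $L$ are introduced here for illustration only.

```latex
% Two Gaussian densities p_1, p_2 on the real line with common variance
% \sigma^2 and means m_1 \neq m_2.
\[
  \log \frac{p_1(x)}{p_2(x)}
  = \frac{(x - m_2)^2 - (x - m_1)^2}{2\sigma^2}
  = \frac{m_1 - m_2}{\sigma^2}\, x - \frac{m_1^2 - m_2^2}{2\sigma^2}.
\]
% Restricted to [-L, L], the log-ratio is affine in x, hence
\[
  d_H(p_1, p_2)\big|_{[-L, L]}
  = \sup_{|x| \le L} \log \frac{p_1(x)}{p_2(x)}
    - \inf_{|x| \le L} \log \frac{p_1(x)}{p_2(x)}
  = \frac{2L\, |m_1 - m_2|}{\sigma^2}
  \xrightarrow[L \to \infty]{} \infty.
\]
```

The distance grows linearly with the size of the domain, so on a non-compact state space no finite projective diameter can be expected without additional structure.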


VII. Conclusion

As an attempt to generalize the contraction-based convergence
analysis of the Kalman filter, we have interpreted
the contraction result of Bougerol in the space of positive
definite (covariance) matrices as a specialization of the
non-expansiveness of the general filtering recursion for hidden
Markov models in the space of positive measurable functions.
In spite of the obstacles to showing a finite projective
diameter in this infinite dimensional space, we feel that this
approach is worth revisiting in the convergence analysis of
filtering algorithms on general graphical models (arbitrary
topology and/or on different spaces of distributions). This is
the topic of ongoing research.